adversarial imitation
Diffusion Imitation from Observation
Learning from Observation (LfO) aims to imitate experts from state-only demonstrations, without requiring action labels. Existing adversarial imitation learning approaches learn a generator (the agent policy) that produces state transitions a discriminator cannot distinguish from expert transitions. Despite the simplicity of this formulation, these methods are often sensitive to hyperparameters and brittle to train. Motivated by the recent success of diffusion models in generative modeling, we propose integrating a diffusion model into the adversarial learning-from-observation framework. Specifically, we employ a diffusion model that captures expert and agent transitions by generating the next state given the current state. We then reformulate the training objective so that the diffusion model acts as a binary classifier, and use it to provide "realness" rewards for policy learning. Our proposed framework, Diffusion Imitation from Observation (DIFO), demonstrates superior performance across various continuous control domains, including navigation, locomotion, manipulation, and games.
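A minimal sketch of the idea described in the abstract: a conditional denoiser models the next state given the current state, its denoising error is repurposed as a classifier logit, and the log-sigmoid of that logit serves as a "realness" reward. All names, the toy noise schedule, and the network shapes below are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to the next state s', conditioned on s and t."""
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, noisy_next, t):
        return self.net(torch.cat([s, noisy_next, t], dim=-1))

def diffusion_logit(model, s, s_next, n_steps=10):
    """Smaller denoising error -> higher 'realness' logit."""
    b = s.shape[0]
    t = torch.randint(1, n_steps + 1, (b, 1)).float() / n_steps
    noise = torch.randn_like(s_next)
    alpha = 1.0 - t  # toy linear schedule, for illustration only
    noisy = alpha.sqrt() * s_next + (1 - alpha).sqrt() * noise
    err = F.mse_loss(model(s, noisy, t), noise, reduction="none").mean(-1)
    return -err  # negative denoising loss as the classifier logit

def discriminator_loss(model, expert_batch, agent_batch):
    """Binary-classifier reformulation: expert transitions are labeled 'real'."""
    logit_e = diffusion_logit(model, *expert_batch)
    logit_a = diffusion_logit(model, *agent_batch)
    return (F.binary_cross_entropy_with_logits(logit_e, torch.ones_like(logit_e))
            + F.binary_cross_entropy_with_logits(logit_a, torch.zeros_like(logit_a)))

def realness_reward(model, s, s_next):
    """Reward for policy learning: log-sigmoid of the diffusion logit."""
    with torch.no_grad():
        return F.logsigmoid(diffusion_logit(model, s, s_next))
```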
Error Bounds of Imitating Policies and Environments
Imitation learning trains a policy by mimicking expert demonstrations. Many imitation methods have been proposed and empirically evaluated, but their theoretical understanding requires further study. In this paper, we first analyze the value gap between the expert policy and policies imitated by two methods: behavioral cloning and generative adversarial imitation. The results show that generative adversarial imitation reduces the compounding errors of behavioral cloning and thus enjoys better sample complexity. Noting that the environment transition model can be treated as a dual agent, imitation learning can also be used to learn the environment model. Based on the bounds for imitating policies, we therefore analyze the performance of imitating environments. The results show that environment models can be imitated more effectively by generative adversarial imitation than by behavioral cloning, suggesting a novel application of adversarial imitation to model-based reinforcement learning. We hope these results inspire future advances in imitation learning and model-based reinforcement learning.
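The contrast can be summarized schematically. The bounds below are a hedged paraphrase of the paper's qualitative message, with constants and technical conditions omitted: for per-state imitation error $\epsilon$ and effective horizon $H$, behavioral cloning's compounding errors yield a quadratic horizon dependence, while adversarial imitation keeps it linear.

```latex
% Schematic value-gap bounds (paraphrased; exact constants and
% assumptions are in the paper):
V(\pi_E) - V(\pi_{\mathrm{BC}})   \;\lesssim\; H^{2}\,\epsilon
\qquad\text{vs.}\qquad
V(\pi_E) - V(\pi_{\mathrm{GAIL}}) \;\lesssim\; H\,\epsilon .
```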
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
Li, Shangzhe, Zhou, Dongruo, Zhang, Weitong
We study online adversarial imitation learning (AIL), where an agent learns from offline expert demonstrations and interacts with the environment online without access to rewards. Despite strong empirical results, the benefits of online interaction and the impact of stochasticity remain poorly understood. We address these gaps by introducing a model-based AIL algorithm (MB-AIL) and establishing horizon-free, second-order sample-complexity guarantees under general function approximation for both expert data and reward-free interactions. These second-order bounds are instance-dependent: they scale with the variance of returns under the relevant policies and therefore tighten as the system approaches determinism. Together with second-order, information-theoretic lower bounds on a newly constructed hard-instance family, we show that MB-AIL attains minimax-optimal sample complexity for online interaction (up to logarithmic factors) with limited expert demonstrations, and matches the lower bound for expert demonstrations in its dependence on the horizon $H$, the precision $\varepsilon$, and the policy variance $\sigma^2$. Experiments validate our theoretical findings and demonstrate that a practical implementation of MB-AIL matches or surpasses the sample efficiency of existing methods.
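For intuition only, a second-order (variance-dependent) guarantee typically takes the following shape; this is an illustrative form, not the paper's exact statement, whose precise dependence on $H$, $\varepsilon$, and $\sigma^2$ is given in its theorems.

```latex
% Illustrative shape of a second-order sample-complexity bound:
n_{\text{online}}
  \;=\; \widetilde{O}\!\left(\frac{\sigma^{2}}{\varepsilon^{2}}
        + \frac{1}{\varepsilon}\right),
\qquad
\sigma^{2} \to 0 \;\Rightarrow\; n_{\text{online}} = \widetilde{O}(1/\varepsilon),
```

so that as the system approaches determinism, the leading variance term vanishes and the bound tightens, matching the instance-dependent behavior described above.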
On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration
Zhou, Yirui, Liu, Xiaowei, Zhang, Xiaofeng, Zhang, Yangchun
Imitation learning (IL) (Pomerleau, 1991; Ng et al., 2000; Syed and Schapire, 2007; Ho and Ermon, 2016), a realm distinct from standard reinforcement learning (RL) (Puterman, 2014; Sutton and Barto, 2018), does not depend on rewards provided by the environment. This characteristic makes IL particularly suited to numerous real-world applications (Bhattacharyya et al., 2018; Shi et al., 2019; Jabri, 2021). The general IL paradigm leverages expert demonstrations containing both states and actions to mimic a high-performing policy (Abbeel and Ng, 2004; Ho and Ermon, 2016; Kostrikov et al., 2020). Based on the policy training strategy, IL divides into two main schemes: on-policy and off-policy training. The on-policy scheme (Ho and Ermon, 2016; Chen et al., 2020) is noted for its stability but requires a significant volume of samples.
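To make the on-policy paradigm concrete, here is a minimal sketch of one round of adversarial IL in the spirit of GAIL (Ho and Ermon, 2016). Every interface (`policy.sample`, `env.step`, the optimizers) is an illustrative placeholder rather than an API from any particular library, and GAIL proper uses TRPO/PPO rather than the plain REINFORCE surrogate shown here.

```python
import torch
import torch.nn.functional as F

def gail_round(policy, discriminator, expert_sa, env, pi_opt, d_opt, horizon=1000):
    """One round of on-policy adversarial imitation (GAIL-style sketch)."""
    # 1. Roll out the current policy to collect fresh on-policy samples.
    agent_sa, logps = [], []
    s = env.reset()                          # assumed to return a state tensor
    for _ in range(horizon):
        a, logp = policy.sample(s)           # assumed policy interface
        agent_sa.append(torch.cat([s, a], -1))
        logps.append(logp)
        s, _, done = env.step(a)             # assumed (state, reward, done) tuple
        if done:
            s = env.reset()
    agent_sa = torch.stack(agent_sa)

    # 2. Discriminator step: classify expert vs. agent state-action pairs.
    d_e, d_a = discriminator(expert_sa), discriminator(agent_sa)
    d_loss = (F.binary_cross_entropy_with_logits(d_e, torch.ones_like(d_e))
              + F.binary_cross_entropy_with_logits(d_a, torch.zeros_like(d_a)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 3. Policy step: maximize the discriminator-derived reward -log(1 - D).
    with torch.no_grad():
        rewards = -F.logsigmoid(-discriminator(agent_sa))
    pi_loss = -(torch.stack(logps) * rewards).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
```

The sample cost mentioned above is visible in step 1: every policy update requires a fresh on-policy rollout, which off-policy schemes avoid by reusing past samples.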
Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions
Li, Chenhao, Blaes, Sebastian, Kolev, Pavel, Vlastelica, Marin, Frey, Jonas, Martius, Georg
Learning diverse skills is one of the main challenges in robotics. To this end, imitation learning approaches have achieved impressive results. However, these methods require explicitly labeled datasets or assume consistent skill execution to enable learning and active control of individual behaviors, which limits their applicability. In this work, we propose a cooperative adversarial method for obtaining a single, versatile policy with a controllable skill set from unlabeled datasets containing diverse state transition patterns, by maximizing their discriminability. Moreover, we show that by utilizing unsupervised skill discovery within the generative adversarial imitation learning framework, novel and useful skills emerge alongside successful task fulfillment. Finally, the obtained versatile policies are tested on an agile quadruped robot, Solo 8, and faithfully replicate the diverse skills encoded in the demonstrations.
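A minimal sketch of how a skill-conditioned reward of this flavor could be assembled, combining an imitation term with a skill-discriminability term in the style of unsupervised skill discovery. The `disc` and `skill_pred` networks are hypothetical placeholders; the paper's actual objectives and architecture differ in detail.

```python
import torch
import torch.nn.functional as F

def skill_conditioned_reward(disc, skill_pred, s, s_next, z):
    """Reward sketch for skill-conditioned adversarial imitation.

    disc(s, s_next)       -> realness logit against the unlabeled mixed dataset
    skill_pred(s, s_next) -> logits over discrete skills (illustrative q(z|s,s'))
    z                     -> long tensor of skill indices for this batch
    """
    with torch.no_grad():
        imitation = F.logsigmoid(disc(s, s_next))              # "looks like the data"
        skill_logp = F.log_softmax(skill_pred(s, s_next), -1)  # q(z | s, s')
        discriminability = skill_logp.gather(-1, z.unsqueeze(-1)).squeeze(-1)
    # Transitions are rewarded for resembling the demonstrations *and* for
    # being attributable to the commanded skill, i.e. maximally discriminable.
    return imitation + discriminability
```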
Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving
Bronstein, Eli, Palatucci, Mark, Notz, Dominik, White, Brandyn, Kuefler, Alex, Lu, Yiren, Paul, Supratik, Nikdel, Payam, Mougin, Paul, Chen, Hongge, Fu, Justin, Abrams, Austin, Shah, Punit, Racah, Evan, Frenkel, Benjamin, Whiteson, Shimon, Anguelov, Dragomir
We demonstrate the first large-scale application of model-based generative adversarial imitation learning (MGAIL) to the task of dense urban self-driving. We augment standard MGAIL with a hierarchical model to enable generalization to arbitrary goal routes, and measure performance using a closed-loop evaluation framework with simulated interactive agents. We train policies from expert trajectories collected from real vehicles driving over 100,000 miles in San Francisco, and demonstrate a steerable policy that navigates robustly even in a zero-shot setting, generalizing to synthetic scenarios with novel goals that never occurred in real-world driving. We also demonstrate the importance of mixing closed-loop MGAIL losses with open-loop behavior cloning losses, and show that our best policy approaches the performance of the expert. We evaluate our imitative model in both average and challenging scenarios, and show how it can serve as a useful prior for planning successful trajectories.
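The loss mixing described in the abstract could look like the following sketch; the `policy.log_prob` interface, the weighting scheme, and the `lam` value are illustrative assumptions, not the paper's actual implementation.

```python
def bc_loss(policy, expert_states, expert_actions):
    """Open-loop behavior cloning: negative log-likelihood of expert actions."""
    return -policy.log_prob(expert_states, expert_actions).mean()  # assumed API

def mixed_imitation_loss(mgail_closed_loop_loss, policy,
                         expert_states, expert_actions, lam=0.5):
    """Weighted mix of the closed-loop MGAIL objective and the open-loop
    BC objective; `lam` is a hyperparameter chosen arbitrarily here."""
    return (lam * mgail_closed_loop_loss
            + (1.0 - lam) * bc_loss(policy, expert_states, expert_actions))
```

The design intuition: the closed-loop adversarial term penalizes distribution shift accumulated over rollouts, while the open-loop BC term anchors the policy to expert actions and stabilizes training.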